Identification of oxidative stress-related genes differentially expressed in Alzheimer’s disease and construction of a hub gene-based diagnostic model

Alzheimer’s disease (AD) is the most prevalent dementia disorder globally, and there are still no effective interventions for slowing or stopping the underlying pathogenic mechanisms. There is strong evidence implicating neural oxidative stress (OS) and ensuing neuroinflammation in the progressive neurodegeneration observed in the AD brain both during and prior to symptom emergence. Thus, OS-related biomarkers may be valuable for prognosis and provide clues to therapeutic targets during the early presymptomatic phase. In the current study, we gathered brain RNA-seq data of AD patients and matched controls from the Gene Expression Omnibus (GEO) to identify differentially expressed OS-related genes (OSRGs). These OSRGs were analyzed for cellular functions using the Gene Ontology (GO) database and used to construct a weighted gene co-expression network (WGCN) and protein-protein interaction (PPI) network. Receiver operating characteristic (ROC) curves were then constructed to identify network hub genes. A diagnostic model was established based on these hub genes using Least Absolute Shrinkage and Selection Operator (LASSO) and ROC analyses. Immune-related functions were examined by assessing correlations between hub gene expression and immune cell brain infiltration scores. Further, target drugs were predicted using the Drug-Gene Interaction database, while regulatory miRNAs and transcription factors were predicted using miRNet. In total, 156 candidate genes were identified among 11046 differentially expressed genes, 7098 genes in WGCN modules, and 446 OSRGs, and 5 hub genes (MAPK9, FOXO1, BCL2, ETS1, and SP1) were identified by ROC curve analyses. These hub genes were enriched in GO annotations “Alzheimer’s disease pathway,” “Parkinson’s Disease,” “Ribosome,” and “Chronic myeloid leukemia.” In addition, 78 drugs were predicted to target FOXO1, SP1, MAPK9, and BCL2, including fluorouracil, cyclophosphamide, and epirubicin. A hub gene-miRNA regulatory network with 43 miRNAs and hub gene-transcription factor (TF) network with 36 TFs were also generated. These hub genes may serve as biomarkers for AD diagnosis and provide clues to novel potential treatment targets.


Identification of DEGs and module genes. A total of 11046 differentially expressed genes (DEGs)
were found between AD and control samples from the GEO GSE132903 dataset, of which 5805 were upregulated and 5241 downregulated ( In weighted gene co-expression network cluster analysis, the gene sets of all samples were included in the dendrogram (i.e., there were no outliers) (Figs. 3, 4). The soft threshold was set at 17 (R2 = 0.85) to construct a scale-free network (Fig. 5), and 20 modules were identified (Fig. 6). Based on the criteria (|Cor| > 0.4), blue, turquoise, yellow, and black modules, collectively including 7098 genes, were selected for further analysis (Fig. 7).
With two types of AD, early-onset and late-onset, both associated with autosomal dominant or sporadic inheritance 18 , studying chromosomal location can help researchers gain a deeper insight into genetic factors. The visualization of gene expression patterns and chromosomal locations of candidate hub genes in AD and normal controls were exhibited in Figs. 8b and 9, suggesting that these genes were distributed in all chromosomes except for chromosome Y, and there were most genes in chromosome 1.
Identification of hub genes by co-expression analysis. To identify hub genes (those most strongly interacting with the greatest number of other DEGs), a PPI network was constructed using the STRING website , and EDN1 (on chromosome 6)) (Figs. 8b, 11b). We further validated these hub genes with screening thresholds p. adj < 0.01 and abs(logFC) > 0.585 19 , and the results demonstrate the reliability of these hub genes in terms of statistical significance (See Table 6 in the Supplementary Material). Of these, MAPK3 and MAPK1 were the most strongly connected to the rest of the network with connectivity scores of 15, followed by SP1 (10) and CASP3 (10) (See Fig. S1 in the Supplementary Material). All 11 candidate hub genes were expressed at different levels in AD and normal samples of the GSE132903 dataset, with APP, MAPK1, MAPK9, and MAPT expression levels higher in control samples and the rest higher in AD samples (Fig. 12). The capacity of each hub gene to distinguish AD from control samples according to expression level was then examined by ROC curve analysis, and the seven candidates with best discrimination performance (area under the ROC curve > 0.7), namely SP1, CASP3, MAPT, BCL2, FOXO1, ETS1, and MAPK9, were selected for further analyses (Fig. 13a).
Of these seven candidate hub genes in the GSE118553 dataset, all except MAPT were significantly differentially expressed between AD and normal samples, with CASP3 and MAPK9 expressed at higher levels in normal samples and SP1, BCL2, FOXO1, and ETS1 expressed at higher levels in the AD samples (see Fig. S5). The AUCs of the ROC curves for SP1, CASP3, BCL2, and MAPK9 were greater than 0.7 (see Fig. S6 in the supplementary material). Moreover, SP1, BCL2, FOXO1, ETS1, and MAPK9 were differentially expressed between AD and control samples from the GSE5281 dataset (see Fig. S7 in the Supplementary Material). Also, the AUCs of the ROC curves for SP1, BCL2, FOXO1, ETS1, and MAPK9 were greater than 0.7 (See Fig. S8 in the Supplementary Material). Based on these analyses, MAPK9, FOXO1, BCL2, ETS1, and SP1 were identified as hub genes. Coexpression analysis revealed that BCL2 and FOXO1 exhibited the strongest positive correlation, while MAPK9 and SP1 demonstrated the strongest negative correlation (Fig. 13b-d).

Correlations between immune cell infiltration scores and hub gene expression levels.
To examine the effects of hub gene expression on neuroinflammation, we assessed associations with the brain infil-      www.nature.com/scientificreports/ tration of specific immune cell types. Expression of BCL2 was significantly correlated with the infiltration scores of 24 immune cell types (all considered except for Type 2 T helper cells, regulatory T cells, eosinophils, and central memory CD4 T cells). The correlations were positive with T follicular helper cell, natural killer cell, natural killer T cell, and effector memory CD8 T cell infiltration scores, and negative with effector memory CD4 T cell and Activated CD4 T cell infiltration scores (Fig. 14a). There was no significant correlation between ETS1 expression and either Type 2 T helper cell or activated CD4 T cell infiltration scores. ETS1 expression was negatively correlated only with effector memory CD4 T cell infiltration score, while all other correlations were positive (Fig. 14b).   Gene set enrichment analysis. Gene set enrichment analysis (GSEA) was then conducted to explore the molecular functions of these hub genes. All five hub genes are included in the "Alzheimer's disease pathway' (i.e., the Alzheimer's disease pathway is overrepresented in the hub gene list), with MAPK9 downregulated and the remaining four upregulated. However, all five genes are also included in the list of genes implicated in "Parkinson's disease' and high expression levels of BCL2, ETS1, and FOXO1 are associated with "ribosome, " "chronic myeloid leukemia, " and "small cell lung cancer' (Fig. 15a-e).
Drugs, genes, and transcription factors regulating hub genes. 78 target drugs were predicted by FOXO1, SP1, MAPK9 and BCL2 ( Fig. 16a and see Table 3 in the Supplementary material). For instance, there  Table 4 in the Supplementary Material) and 36 TFs. For example, GATA2 and NFIC are TFs controlling SP1, ETS1, and MAPK9 expression ( Fig. 16c and Table 5 in the Supplementary Material).
Diagnostic model based on 5 hub genes. A diagnostic model was then constructed based on expression levels of the five hub genes using LASSO regression and ROC analyses. For the optimal model developed using the training dataset, lambda.min = 0.00071235 (Fig. 17), and the coefficients of the five genes were BCL2 = −1.9400531 , ETS1 = 1.3313010, FOXO1 = 3.6540838, MAPK9 = 0.6012298, and SP1 = 1.5144388. Moreover, the confusion matrix showed that the specificity of the model reached 0.74, accuracy 0.75, and recall 0.76 for the training set (Fig. 18a). Further, the AUC of the ROC curve was 0.818, indicating good discrimination performance (Fig. 18b).
Model performance was then validated using the GEO dataset GSE5281 (Fig. 18c). The confusion matrix showed that the model achieved similar precision (0.94), accuracy (0.786), and recall (0.75) on this dataset. The AUC of the ROC curve was also similar (0.88), underscoring the general utility of the diagnostic model (Fig. 18d).

Discussion
A prominent early feature of AD is the accumulation of OS markers [20][21][22] , which results from an imbalance between endogenous ROS generation and cellular antioxidant capacity. The primary endogenous sources of ROS, such as mitochondrial respiration and various enzymatic reactions, and the physiological mechanisms regulating endogenous antioxidants are well described. Moreover, many exogenous factors are known to directly scavenge ROS and/or enhance endogenous antioxidant capacity. Therefore, limiting OS and associated damage may be a particularly promising area of AD drug research and development 4 . Here we identified numerous potential OS markers by comparing gene expression profiles between patients and matched controls. X chromosome aberrations have been confirmed by previous studies 23 to be highly associated with oxidative stress and could be a breakthrough for future disease-related investigations. Meanwhile, mutations in three different genes on chromosomes 1, 14, and 21 were reported to result in autosomal dominant forms of familial Alzheimer's disease, that is 24 , the APP gene on chromosome 21, the PS1 gene on chromosome 14, and the PS2 gene onchromosome 1 23,25 . Based on the visualization of the chromosomal location of candidate hub genes (Figs. 8b and 9), it was showed that the AD-related differential genes identified were distributed on these crutial chromosomes, with the most differential candidates on chromosome 1. Meanwhile, chromosome 6 locus and chromosome 12 are highly associated with late-onset Alzheimer's disease 26 , while AD is also highly neuropathologically similar to familial British dementia and familial Danish dementia, which are highly associated with chromosome 13 27 . It further indicates that these chromosomes are significantly associated with AD development. www.nature.com/scientificreports/ Then we defined five major hub genes of a network associated with oxidative stress in AD patients. These detected hub genes included the stress-associated signaling factor MAPK9, the negative regulator of apoptosis BCL2, and three transcription factors, FOXO1, ETS1, and SP1, associated with various stress and protective responses. A diagnostic model based on the expression levels of these five genes distinguished patients from controls with high sensitivity and selectivity.  www.nature.com/scientificreports/ Expression of MAPK9 was consistently higher in normal samples while the expression levels of FOXO1, BCL2, ETS1, and SP1 were higher in AD samples (See Fig. S5 in the Supplementary Material). Construction of hub gene-miRNA and hub gene-TF networks identified multiple regulatory pathways contributing to these differences in expression, and many of the interactions of DEGs with TFs and miRNAs we identified were demonstrated experimentally in past studies [28][29][30][31][32] . These DEGs and upstream TFs are involved in a variety of biological processes, such as the immune response, apoptosis, and vascular morphogenesis. Previous studies have strongly implicated BCL2 and FOXO1 in neuronal survival, while FOXO and MAPK signaling pathways are known as important regulators of the cell cycle and autophagy/apoptosis 33 . Previous studies have also reported BCL2 overexpression in the fontal cortex of AD patients and suggested that this reflects a compensatory anti-apoptotic response that is stronger in frontal cortex than in other brain regions [34][35][36] . There is also compelling evidence for a relationship between dysregulated BCL2 expression and impaired learning and memory. High FOXO1 expression may be linked to age-related neurodegeneration through regulation of advanced glycation end product (AGE) signaling pathways, cellular senescence mechanisms, autophagic pathways, and apoptotic pathways induced by accumulation of misfolded proteins or cellular damage 37,38 . Elevated ETS1 expression may be associated with microglia activation and proliferation, a critical initiating event in neuroinflammation 39,40 . The transcription factor SP1 is strongly implicated in the regulation of OS response genes, and notably also in the expression of amyloid precursor protein (APP) and tau protein. Moreover, SP1 has been found to be significantly upregulated in the frontal cortex of AD patients 30,41,41,42 . Finally, MAPK9 (also known as JNK2) plays an important role in the oxidative stress response 43 , and many studies have confirmed that JNK signaling is associated with aging 44,45 .
A diagnostic model constructed based on the expression levels of these five genes (MAPK9, BCL2, FOXO1, ETS1, and SP1) using LASSO regression and ROC analysis demonstrated high accuracy for distinguishing patients from matched controls in both training and validation datasets. Therefore, this model may have clinical value for the early diagnosis of AD. BCL2 is a primary negative regulator of apoptosis that may be induced in response to stressors such as DNA damage. In contrast, however, neurons with neurofibrillary tangles (NTs), a hallmark of AD, exhibit downregulated BCL2 expression and thus potentially greater susceptibility to apoptosis 46 . Recent research has confirmed that BCL2 also regulates intracellular Ca 2 + signaling by influencing mitochondrial-and endoplasmic reticulum-based mechanisms for Ca2 homeostasis, and aberrant neuronal Ca 2 + signaling has long been implicated in AD as well as other neurodegenerative disorders. Therefore, anti-apoptotic BCL2 and its derived peptides may have potential for therapeutic prevention of AD pathogenesis 47 . ETS1 is a proangiogenic transcription factor, and enhanced intravascular immunoreactivity as well as extravasation have been documented in the cortical microvasculature of AD brain. The expression of ETS1 is also strongly correlated with expression of tumor necrosis factor-α (TNF-α), an important pro-inflammatory cytokine released by reactive microglia (ref). Therefore the expression pattern of ETS1 in the brain of AD patients is likely of great importance to disease progression 39 . The FOXO transcription factors regulate the balance between cell longevity and tumor suppression by controlling the expression of proteins involved in cell cycle arrest, stress resistance, and apoptosis. Therefore, among these downstream proteins are numerous potential molecular targets for the treatment of AD 48 . The c-Jun N-terminal kinase (JNK) signaling cascade is triggered in response to a broad spectrum of stressors 49 . Of particular relevance to AD, JNK-MAPK proteins are involved in the regulation of apoptosis during oxidative damage through phosphorylation of effector proteins or transcription factors, which may lead to dysregulation of tau phosphorylation and accumulation of hyperphosphorylated tau, the major component of NTs 50,51 . The transcription factor SP1 also responds to inflammatory signals, such as those induced by neuronal damage. Elevated www.nature.com/scientificreports/ expression of SP1 in AD samples may be a protective response, so it would be intriguing to investigate whether regulation of SP1 by drugs such as mithramycin can improve AD symptoms such as memory dysfunction 52 . The known functions of these hub genes are also consistent with the strong correlations with immune infiltration scores and enrichment analyses using GO and DEGG databases. The identification of these hub genes may thus facilitate the development of novel disease-modifying therapies as well as symptom-targeted drugs.
In this study, we firstly identified five hub genes that target oxidative stress (OS) process in Alzheimer's disease (AD) through bioinformatic analysis, which were then examined for biological processes and pathways, associations with brain infiltration of specific immune cells, and for their capacity as a diagnostic model to distinguish patients from controls. While further experimental validation is required to verify OS-and AD-related functions, the high performance of the diagnostic model and previous investigations strongly implicate these five hub genes, their downstream targets for transcriptional regulation, and upstream regulators (including numerous miRNAs) in the OS response in AD. It is worth that 78 drugs identified that target these hub genes have been not proven to mitigate AD, and no EST1-modulating drugs were detected, suggesting that testing of these agents in AD animal models and development of small-molecule EST1 regulators are potentially productive areas for future research. The distribution of candidate hub genes on chromosomes may provide insights for future studies on www.nature.com/scientificreports/ potential pathogenic mechanisms underlying potential pathogenesis on other chromosomes. Also, it is vital to supplement this model to distinguish AD from Parkinson's disease.

Methods
Data sources. Datasets GSE132903, GSE118553, and GSE5281 were obtained from the National Center for Biotechnology Information database (https:// www. ncbi. nlm. nih. gov/). The GSE132903 dataset contains expression profiles of middle temporal gyrus from 97 Alzheimer's disease patients and 98 non-demented controls, GSE118553 includes transcriptomic data of brain tissues from 52 AD patients and 31 controls, and GSE5281 comprises gene expression data of brain samples from 16 AD patients and 11 controls. Moreover, 446 oxidative stress-related genes (OSRGs) were obtained from the Gene Ontology (GO) database (Data cohort: GO:0006979). The minimum age of study subjects in these datasets is 70 years.
Identification and functional enrichment analysis of candidate genes influencing the oxidative stress response in Alzheimer's disease. The Limma package (version 3.50.1) was used to perform differential expression analysis between AD and control samples of the GSE132903 dataset. In this study, since the training set is microarray data with overall small logFC values, the screening threshold of DEGs was adjusted p value (p. adj) < 0.05 53 . Considering that a large number of genes were available in the original dataset, the adjusted p-values provided a good balance between finding statistically significant genes and limiting the occurrence of false positives by implementing a Benjamini-Hochberg (BH) correction 54 to control the false discovery rate of the entire significant genes, and considered for multiple testing. Further, the traits of AD patients and normal controls of the GSE132903 dataset were including in WGCNA analysis (WGCNA version 1.70-3) 55 to define gene modules related to AD. Briefly, samples were first clustered to determine whether there were outliers that needed to be removed. Then, the optimal soft threshold was defined to ensure that gene interactions maximally conformed to a scale-free distribution. Based on the optimal soft threshold, a gene dendrogram was plotted using the dynamic tree cutting algorithm. After calculating the correlations between each module and traits, the modules with |Cor| > 0.4 were selected as key modules and the component genes were included as module genes for subsequent analyses. Candidate genes potentially influencing the OS response in AD were then identified by the overlap among DEGs, module genes, and OSRGs. The GO and KEGG enrichment analyses of candidate genes were performed using clusterProfiler (version 4.2.2), with a significance criterion of p < 0.05. Furthermore, genomic location distributions and expression levels of the candidate genes were visualized using OmicCircos (version 1.32.0).

Identification and correlation analysis of hub genes.
A PPI network of the hub candidate genes was constructed using the STRING website, and core network genes selected using the MCODE plug-in of Cytoscape (version 3.8.2). The genes in the core network were then evaluated for distinguishing AD from normal samples using ROC curve analysis, and those with AUC > 0.7 were selected. These candidate hub genes were further verified by ROC analysis using GSE118553 and GSE5281 dataset samples. Candidates showing reliable group classification performance in all datasets were selected as the final hub genes. Subsequently, the correlation in expression level between each hub gene pair was calculated and scatter plots drawn for the gene pairs with strongest positive and negative correlations.
Immune infiltration analysis and GSEA. Neuroinflammation is a major driver of AD progression, so we examined if these hub genes were associated with the infiltration of various immune cells 56 . Correlations with the infiltration of 28 immune cell types in samples from GSE132903 were estimated by single-sample (ss) GSEA using GSVA (version 1.42.1). To further explore the functions of these hub genes, samples with high and low expression were analyzed using The Molecular Signatures Database (MSigDB). Among the significantly enriched KEGG pathways (p. adj < 0.05), the top five with normalized enrichment score (NES) > 0 and NES < 0 were selected for presentation.

Regulatory networks and target drugs of hub genes. The miRNAs and transcription factors (TFs)
regulating hub genes were predicted using the miRNET database 57 . To improve the clarity of network diagrams, only miRNAs targeting at least three hub genes were screened and visualized. Additionally, drugs influencing expression of hub genes were predicted using the Drug-Gene Interaction database (DGIdb) 58 , which includes drug-gene interaction data from 30 sources.
Construction of a diagnostic model. A diagnostic model based on expression levels of the five hub genes was then established using GSE132903 as the training set. The expression levels of all hub genes were extracted and subjected to LASSO regression analysis to derive coefficients for diagnostic model construction. To evaluate the effectiveness of the diagnostic model, all samples in GSE132903 were categorized as disease or control, and the effectiveness of this categorization verified by plotting a confusion matrix and ROC curves. In addition, these analyses were repeated using GSE5281 as an external validation set.